chore: ps example. #327

k82cn · 2026-01-23T14:16:07Z

No description provided.

Signed-off-by: Klaus Ma <klausm@nvidia.com>

gemini-code-assist · 2026-01-23T14:16:34Z

Summary of Changes

Hello @k82cn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, comprehensive example showcasing a Parameter Server architecture for distributed machine learning, built upon the flamepy.rl.Runner framework. Concurrently, it refines the core application configuration by making the working_directory field optional, providing greater flexibility in how applications specify their execution environment. Additionally, the Python SDK's package management has been made more robust, ensuring reliable setup for Python-based applications.

Highlights

Refactored working_directory to be Optional: The working_directory field in Application and ApplicationAttributes structs in common/src/apis.rs has been changed from a mandatory String to an Option<String>, allowing applications to specify an optional working directory.
Updated Default Working Directory Handling: The default value for working_directory in ApplicationAttributes is now None, and the application execution logic in executor_manager/src/shims/host_shim.rs has been updated to use an executor-specific directory if no working directory is provided.
Introduced Parameter Server Example: A new example (examples/ps/) has been added, demonstrating a synchronous Parameter Server training setup using flamepy.rl.Runner with PyTorch for an MNIST-like task.
Enhanced Python Package Installation: The Python SDK's _extract_archive and _install_package_from_url methods in sdk/python/src/flamepy/rl/runpy.py have been improved to ensure clean extraction by removing old directories and to use pip install --upgrade for more robust package updates, along with added debug logging.
Updated Test Cases and Data Mappings: Various test cases and data mapping implementations across session_manager and common crates have been adjusted to align with the new Option<String> type for working_directory.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a parameter server example, which is a great addition. It also refactors the working_directory attribute for applications to be optional, providing a more sensible default when it's not specified. Additionally, there are some nice improvements to the Python SDK's package installation logic.

My review includes a few suggestions to improve the new example code's robustness, such as handling file paths correctly and ensuring variables are initialized before use. I also found a small improvement for file handling in the Python SDK. Overall, the changes are well-done.

gemini-code-assist · 2026-01-23T14:19:42Z

examples/ps/main.py

+                accuracy = evaluate(model, test_loader)
+                print("Iter {}: \taccuracy is {:.1f}".format(i, accuracy))
+
+        print("Final accuracy is {:.1f}.".format(accuracy))


The accuracy variable is only defined inside the if i % 10 == 0: block within the loop. If the loop range were smaller (e.g., range(0)), this line would raise an UnboundLocalError. It would be safer to initialize accuracy to a default value (e.g., 0.0) before the loop.

gemini-code-assist · 2026-01-23T14:19:42Z

examples/ps/ps.py

+    with FileLock(os.path.expanduser("~/data.lock")):
+        train_loader = torch.utils.data.DataLoader(
+            datasets.MNIST(
+                "~/data", train=True, download=True, transform=mnist_transforms
+            ),
+            batch_size=128,
+            shuffle=True,
+        )
+        test_loader = torch.utils.data.DataLoader(
+            datasets.MNIST("~/data", train=False, transform=mnist_transforms),
+            batch_size=128,
+            shuffle=True,
+        )


The path ~/data is not guaranteed to be expanded by torchvision.datasets.MNIST. It's better to use os.path.expanduser explicitly to ensure the path is resolved correctly. Also, it's good practice to place the lock file within the directory it's protecting to keep things organized. I've refactored this part to define data_dir once and use it for both the dataset and the lock file.

Suggested change

with FileLock(os.path.expanduser("~/data.lock")):

train_loader = torch.utils.data.DataLoader(

datasets.MNIST(

"~/data", train=True, download=True, transform=mnist_transforms

),

batch_size=128,

shuffle=True,

)

test_loader = torch.utils.data.DataLoader(

datasets.MNIST("~/data", train=False, transform=mnist_transforms),

batch_size=128,

shuffle=True,

)

data_dir = os.path.expanduser("~/data")

# We add FileLock here because multiple workers will want to

# download data, and this may cause overwrites since

# DataLoader is not threadsafe.

with FileLock(os.path.join(data_dir, ".lock")):

train_loader = torch.utils.data.DataLoader(

datasets.MNIST(

data_dir, train=True, download=True, transform=mnist_transforms

),

batch_size=128,

shuffle=True,

)

test_loader = torch.utils.data.DataLoader(

datasets.MNIST(data_dir, train=False, transform=mnist_transforms),

batch_size=128,

shuffle=True,

)

gemini-code-assist · 2026-01-23T14:19:43Z

sdk/python/src/flamepy/rl/runpy.py

+                # Check for pyproject.toml or setup.py
+                if "pyproject.toml" in contents:
+                    pyproject_path = os.path.join(install_path, "pyproject.toml")
+                    with open(pyproject_path, "r") as f:


It's a good practice to explicitly specify the file encoding when opening text files. This avoids potential UnicodeDecodeError if the file contains non-ASCII characters and the system's default encoding is not UTF-8. Please add encoding="utf-8".

Suggested change

with open(pyproject_path, "r") as f:

with open(pyproject_path, "r", encoding="utf-8") as f:

Signed-off-by: Klaus Ma <klausm@nvidia.com>

chore: ps example.

5fa2960

Signed-off-by: Klaus Ma <klausm@nvidia.com>

gemini-code-assist bot reviewed Jan 23, 2026

View reviewed changes

k82cn added 2 commits January 24, 2026 01:23

fix build error.

ec0eff9

Signed-off-by: Klaus Ma <klausm@nvidia.com>

fix working_dir empty issue.

88e0189

Signed-off-by: Klaus Ma <klausm@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: ps example. #327

chore: ps example. #327

Uh oh!

k82cn commented Jan 23, 2026

Uh oh!

gemini-code-assist bot commented Jan 23, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 23, 2026

Uh oh!

gemini-code-assist bot Jan 23, 2026

Uh oh!

gemini-code-assist bot Jan 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	with open(pyproject_path, "r") as f:
	with open(pyproject_path, "r", encoding="utf-8") as f:

chore: ps example. #327

Are you sure you want to change the base?

chore: ps example. #327

Uh oh!

Conversation

k82cn commented Jan 23, 2026

Uh oh!

gemini-code-assist bot commented Jan 23, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant